(ResNet-34) was trained and tested for DR. Then, we develop computationally efficient and scalable methods
after modifying a ResNet-34 with three additional residual units as a novel ResNet-n/DR. The Asia Pacific
Tele-Ophthalmology

Keywords: Diabetic retinopathy, Residual neural network
1. INTRODUCTION 

Recent developments in artificial intelligence (AI) have paved the way for significant advances in 
automatic diagnosis across medical fields, compared with manual methods. Computer-aided diagnosis 
systems (CADs) are a rapidly growing field in healthcare. Researchers increasingly focus on making them 
an influential contributor to the early detection of disease, because early detection helps avoid disease 
exacerbations and increases the likelihood of recovery. CADs can reduce human error, support medical 
decisions, and improve patient care [1]-[3]. 

Deep learning (DL) is a class of artificial neural network methods based on representation learning. DL 
enables the development of high-performance AI systems in various fields, including computer vision, speech 
recognition, and natural language processing [4], [5]. DL can identify hidden patterns, extract features, and 
learn them by incorporating multiple hidden layers into a neural network [6], [7]. 

Biomedical image analysis, which previously depended on traditional machine learning (ML) techniques, 
increasingly recognizes the benefits of DL networks, especially convolutional neural networks (CNNs) [8]. 
The CNN is the most efficient DL algorithm for images. A CNN employs a bank of image processing filters to 
extract various features from the images that the network considers indicators of disease [9]. 

Journal homepage: http://telkomnika.uad.ac.id 
ISSN: 1693-6930 

Extracting valuable features from medical images is critical for a correct diagnosis. Previously, 
machine learning methods were limited in their performance by the accuracy of manually extracted features. 
With CNNs, manual extraction is no longer an issue; the new challenge is defining the optimal architecture and 
hyperparameters to achieve maximum performance. 

In ophthalmology, diabetic retinopathy (DR) is one of the eye diseases associated with diabetes and is 
the leading cause of vision loss [10]. Examining retinal images can assist ophthalmologists in screening 
for eye diseases. Categorizing retinal images is an intriguing problem in computer vision with numerous 
medical applications [11]. A thorough understanding of the retinal image is critical for ophthalmologists when 
diagnosing eye diseases, which, if not treated early, lead to vision problems and blindness. In addition, early 
treatment of DR is cost-effective compared with the high cost of a late or wrong diagnosis [12]. Therefore, 
developing a highly competitive deep CNN (DCNN) model for diagnosing DR is the main focus of this study. DR is 
a diabetic complication that impairs vision by damaging the retina, the light-sensitive tissue at the back of 
the eye that is required for vision [13]-[15]. Figure 1 depicts the retina of a healthy person and of a person 
affected by DR. 

DR progresses through several stages, depending on the appearance and development of lesions in the 
retina, from the earliest stage to the most severe, which can lead to blindness. According to [14], [16], 
diabetic retinopathy is classified into two main types: non-proliferative diabetic retinopathy (NPDR) and 
proliferative diabetic retinopathy (PDR). NPDR is further divided into mild, moderate, and severe substages 
[17], [18]. Figure 2 shows the different DR stages with their features. 

Previously published research relied on a variety of methods for detecting DR and extracting features. 
Researchers have proposed various systems based on image processing, ML, and DL, demonstrating their 
effectiveness and importance in ophthalmology, but there is still room for better results. 
Ananthapadmanaban and Parthiban [19] proposed models based on data mining techniques to predict DR. 
The researchers used a support vector machine (SVM) and naive Bayes to detect DR early, with RapidMiner 
as the data mining tool, which allows nearly arbitrary processes to be composed. Naive Bayes achieved 
higher accuracy than the SVM. The study also showed that data mining can retrieve useful correlations 
even when the attributes are not direct indicators of the class being predicted. 

Shaban et al. [17] proposed a novel CNN architecture with 18 convolutional layers and three fully 
connected layers, a modified version of the visual geometry group 19 (VGG-19) network in which 2 conv layers 
with ReLU were added to the middle two stages. The pre-trained weights of VGG-19 were used to initialize the 
parameters of the proposed model. The model classified DR into three classes: normal, moderate (mild or 
moderate DR patients), and severe (severe NPDR or PDR). 

Graham [20] won first place in the Kaggle DR detection competition. First, several preprocessing 
steps were used to remove illumination differences, including rescaling to (300x300) pixels, subtracting 
the average color, and cropping the image’s outer border. The preprocessed retinal images were then 
classified using a CNN (SparseConvNet). Data augmentation was used to expand the dataset, which 
improved the training process. 

Kassani et al. [21] introduced transfer learning (TL) with a modified Xception that concatenates features 
extracted from the intermediate layers. A multi-layer perceptron receives the extracted features for training, 
enabling the model to classify DR into five stages. The researchers compared the performance of their improved 
network with the original Xception, ResNet50, and InceptionV3 to demonstrate its efficiency. The pre-processing 
techniques included resizing, a min-pooling filter, and normalization; L1 and L2 regularization were also 
used. 


[Figure 1 panels: Normal Retina, Diabetic Retina]

Figure 1. Comparison of the normal retina against the diabetic retina 
Islam et al. [22] focused on image pre-processing based on a new image smoothing technique 
(Gaussian filter). Relying on TL, the researchers used the VGG-16 model with pre-trained weights 
after adding two fully connected layers. The APTOS2019 dataset was used in this research, and the 
diagnosis task covered five DR stages. The researchers reported high accuracy with the proposed 
approach, which they attributed to particular attention to image pre-processing. 


Figure 2. DR stages [14] 


Gangwar and Ravi [23] proposed a hybrid Inception-ResNet-v2 constructed by adding a custom 
four-inception block to the pre-trained Inception-ResNet model. The dataset was enlarged with augmentation 
(flipping, rotation, zooming, and shifting). The Messidor-1 and APTOS 2019 datasets were used to evaluate 
the hybrid model separately, after pre-processing the images with blurring, cropping, and resizing. 

Al-Smadi et al. [24] applied the TL concept to modified pre-trained networks 
(ResNet50, InceptionV3, InceptionResNet, InceptionV4, DenseNet, Xception, and EfficientNet), adding 
a new unified classifier to all networks with four fully connected (FC) layers and with dropout and batch 
normalization (BN) layers between the classifier layers. Data augmentation and oversampling were used 
to overcome data imbalance and overfitting. The Inception-V3-based model with the custom classifier 
performed best in this research. 

Alyoubi et al. [25] developed two CNN models, CNN512 and CNN299, to automatically detect DR stages. 
The researchers built the CNN models from scratch with one zero-padding layer and with conv, max-pooling, 
BN, and FC layers whose numbers differ between the two models. The input image sizes were 
(512x512x3) and (299x299x3) for CNN512 and CNN299, respectively. The pre-trained EfficientNetB 
model was also tested in this research. Contrast limited adaptive histogram equalization (CLAHE) and a 
Gaussian filter were used to reduce noise and to enhance fundus image contrast. Cropping, color 
normalization, and data augmentation were also used. 

In contrast to most currently available research on DR diagnosis, which extracts features from raw 
fundus images, our work first pre-processes the images to reduce noise and highlight the disease’s 
symptoms. We then develop a custom-designed DCNN architecture with optimized parameters that outperforms 
all other tested algorithms in diagnosing DR in its five stages using 3-channel color (RGB) fundus images. 
Given the high cost of misdiagnosis in medical fields, our research uses several criteria to verify the 
efficiency of the proposed model, especially sensitivity and specificity. The resulting system achieves 
high accuracy with minimal error. The remainder of the paper discusses our methodology and the outcomes 
attained. 


2. METHOD 

The accuracy of a CNN model is closely related to the number of conv layers, the filter weights, and the 
number of filters in each layer. At the beginning of the search for the best architecture for the DR diagnosis 
task, in our previously published work [26], we tested three well-known deep learning architectures, VGG-19, 
Xception, and ResNet-34, as feature extractors by modifying them with custom classifiers suited to 
DR diagnosis in four stages (normal, moderate, severe, and proliferative). Those results led us 
to select ResNet-34 as a suitable architecture for our task. Here we develop a novel model, modified from 
ResNet-34, that diagnoses DR in all five stages (normal, mild, moderate, severe, and proliferative). 
Various preprocessing steps are used to reduce the noise in retinal fundus images and to augment the data, 
which significantly improves accuracy. The next sections describe the proposed work and its implementation 
in more detail. 
2.1. Training dataset 

Datasets serve as the foundation of any deep learning model. Model performance depends on the 
quality and validity of the data and on how well the data represent the problem domain. APTOS 2019 [27] is 
among the online retinal fundus image datasets available for diagnosing DR. It contains fundus images 
with labels indicating one of five DR stages (0-normal, 1-mild, 2-moderate, 3-severe, and 
4-proliferative). The class distribution of the dataset is shown in Table 1. 


Table 1. The distribution of images among classes within the dataset 

Class index   DR stage        APTOS 2019 dataset 
0             Normal          1805 
1             Mild            370 
2             Moderate        999 
3             Severe          193 
4             Proliferative   295 
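
The imbalance visible in Table 1 can be quantified with a few lines of arithmetic. The sketch below only restates the counts from the table; the function names are illustrative and not part of the original work.

```python
# Image counts per DR stage, copied from Table 1 (APTOS 2019).
CLASS_COUNTS = {
    "Normal": 1805,
    "Mild": 370,
    "Moderate": 999,
    "Severe": 193,
    "Proliferative": 295,
}


def class_shares(counts):
    """Return each class's share of the dataset as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    print(f"total images: {sum(CLASS_COUNTS.values())}")
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"{name:<14}{share:5.1f} %")
    print(f"largest/smallest class ratio: {imbalance_ratio(CLASS_COUNTS):.2f}")
```

Roughly half of the images are normal, while the severe class is more than nine times smaller than the normal class, which is why accuracy alone is a weak summary for this dataset.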


2.2. Preprocessing 

Since APTOS 2019 was released as a raw dataset for a research challenge, its images vary in 
lighting conditions and camera resolution, resulting in low contrast between the DR signs and the 
background. In our proposed work, we enhance the visibility of DR signs and reduce noise through 
several steps, as shown in Figure 3. After the preprocessing steps, all images were normalized to 
preserve the effectiveness of models pre-trained on ImageNet. The preprocessing steps were: 
a) Gaussian blurring: the fundus images are blurred using the Gaussian function to reduce noise. 


$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \tag{1}$$

where $\sigma$ is the standard deviation of the distribution, and $x$ and $y$ are the distances from the origin 
along the horizontal and vertical axes. In our experiments, $\sigma$ is equal to 30. 

b) Subtract local average color (LAC): the blurred image from the previous step was subtracted from the 
original image, so that the result retains mostly the high-frequency components. The result of the 
subtraction was then added back to the original image: 

$$I_c = \alpha I + \beta\, G(\rho) * I + \gamma \tag{2}$$

where $*$ denotes convolution, $I$ denotes the input image, $I_c$ is the enhanced image, and $G(\rho)$ is the 
Gaussian filter with standard deviation $\rho$. We chose 4 and -4 as the weights ($\alpha$, $\beta$) for the 
original and blurred images, and 128 for $\gamma$. 

c) Image masking: we applied circular masks with dark backgrounds to the images, using the “OpenCV” 
Python library, to make it easier to crop the fundus area from the unnecessary background. 

d) Cropping and resizing: in this step, we cropped pixels from the left and right sides of each original image 
to give it a square shape and remove uninformative parts, and then down-sampled it 
to (224x224) pixels to be fed to our DL model, following these steps: 

— Step 1: find the height (n) and width (m) of each image. 
— Step 2: crop a part from the left ($cp_{left}$) and right ($cp_{right}$) of the image, where: 

$$cp_{left} = \frac{m - n}{2} \tag{3}$$

$$cp_{right} = m - \left(\frac{m - n}{2} + n\right) \tag{4}$$

— Step 3: resize the image resulting from step 2 to (224x224). 
— Step 4: repeat the steps for all dataset images. 

e) Data splitting: we partitioned the dataset into three parts: training, validation, and testing. 10% of 
the data was set aside for testing the final trained model. The remaining 90% was randomly divided at a 
ratio of 75:25 into the training set and the validation set used during training. 
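
The filtering and cropping steps above can be sketched without any imaging library. The code below is a minimal, dependency-free illustration on a single-channel image stored as nested lists, not the authors' implementation: it blurs with a sampled Gaussian kernel (normalised to sum to 1, which stands in for the analytic normalisation factor), applies the weighted combination with weights 4 and -4 and offset 128, and computes the left and right crop widths that make an image of height n and width m square. The clipping to [0, 255], the integer division, the kernel radius, and all function names are assumptions made for the example. In practice the same blur and blend would more likely be done with OpenCV's `GaussianBlur` and `addWeighted`.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Sampled 1-D Gaussian, normalised to sum to 1 (the 2-D Gaussian is separable)."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating the edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma, radius):
    """Separable 2-D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel_1d(sigma, radius)
    rows_done = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(rows_done), kernel))


def subtract_local_average(image, sigma, radius, alpha=4.0, beta=-4.0, gamma=128.0):
    """alpha * I + beta * (G * I) + gamma, clipped to [0, 255] (clipping is assumed)."""
    blurred = gaussian_blur(image, sigma, radius)
    return [[min(255.0, max(0.0, alpha * p + beta * b + gamma))
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(image, blurred)]


def crop_offsets(height, width):
    """Columns to drop on the left and right so that the crop is square."""
    cp_left = (width - height) // 2          # integer division is an assumption
    cp_right = width - (cp_left + height)    # whatever remains on the right
    return cp_left, cp_right


if __name__ == "__main__":
    # Hypothetical 5x5 grey patch with one bright spot in the centre.
    patch = [[100.0] * 5 for _ in range(5)]
    patch[2][2] = 200.0
    enhanced = subtract_local_average(patch, sigma=1.0, radius=2)
    print([round(v, 1) for v in enhanced[2]])
    print(crop_offsets(height=1000, width=1500))
```

A uniform region maps to the mid-grey value 128, while local deviations from the blurred background are amplified four times, which is what makes small lesions stand out. Writing the right-hand crop as the remainder keeps the result exactly square even when m - n is odd.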

Finally, we applied data augmentation online during training, producing additional instances of the 
same images through geometric transformations while retaining their labels, to reduce overfitting and 
expand the dataset size. Horizontal flip, rotation, and zooming were applied to the training set. The 
augmentation settings are listed in Table 2. 
Table 2. Augmentation operation settings 

Transformation    Setting 
Horizontal flip   The images flip horizontally around the axis 
Rotation          [-45, +45] random rotations around the center 
Zooming           10 degrees 
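
As a rough illustration of the split ratios and of what online augmentation means, the sketch below computes the subset sizes for the 3662 APTOS 2019 images and draws fresh augmentation parameters for each sample. The rounding rule, the 50% flip probability, and the function names are assumptions, and zooming is left out of the sketch; the original pipeline may differ.

```python
import random


def split_sizes(n_images, test_fraction=0.10, train_share=0.75):
    """Hold out the test set first, then split the remainder 75:25."""
    n_test = round(n_images * test_fraction)   # rounding rule is an assumption
    n_rest = n_images - n_test
    n_train = round(n_rest * train_share)
    n_val = n_rest - n_train
    return n_train, n_val, n_test


def horizontal_flip(image):
    """Mirror an image (nested lists of pixels) left to right."""
    return [row[::-1] for row in image]


def sample_augmentation(rng):
    """Draw new parameters each time a training image is served."""
    return {
        "flip": rng.random() < 0.5,              # flip probability is assumed
        "angle_deg": rng.uniform(-45.0, 45.0),   # rotation range from Table 2
    }


if __name__ == "__main__":
    print(split_sizes(3662))
    rng = random.Random(0)
    print(sample_augmentation(rng))
    print(horizontal_flip([[1, 2, 3], [4, 5, 6]]))
```

Under these assumptions, the 75:25 split of the remaining 90% corresponds to about 67.5% of all images for training and 22.5% for validation.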


[Figure 3 flowchart: raw fundus image dataset; image filtering (Gaussian blurring, LAC subtraction, image 
masking, cropping and resize); data splitting; data augmentation] 

Figure 3. An illustration of all data preprocessing steps 


2.3. ResNet-n/DR 
Our previous work [26] showed that ResNet-34/DR achieved the highest accuracy, owing to the advantage 
of the residual module and to the gain in efficiency as depth increases. Building on that outcome, 
we developed a novel ResNet-n/DR as our final model for diagnosing fundus images by increasing 
the depth of the ResNet-34 network with three additional residual modules on top of the original 
convolutional base (33 conv layers). The feature maps extracted from the final convolutional layer were 
reduced from (7x7x512) to (4x4x512) in the ResNet-n/DR model. Table 3 summarizes the model layers with each 
part’s input size, output size, and number of parameters. Figure 4 shows the proposed architecture and the 
parts of the ResNet-n/DR model. The construction of ResNet-n/DR is summarized as follows: 
— Step 1: use 39 conv layers divided into 6 blocks to create the convolutional part. The first 5 
blocks follow the ResNet-34 structure. 
— Step 2: build three residual units, each with two conv layers of 512 filters and (3x3) kernel size, at the 
top of the architecture. 
— Step 3: use global average pooling (GAP) to convert the (4x4x512) feature map to a (1x1x512) feature map. 
— Step 4: add a dropout layer that gives each node a 25% dropout probability. 
— Step 5: add one FC layer with five nodes, followed by a SoftMax activation function, to categorize 
DR into 5 stages. 


Table 3. ResNet-n/DR parts and parameters 

Layer (type)                          Input shape     Output shape   Parameters 
39 conv layers (convolutional base)   (224, 224, 3)   (4, 4, 512)    35,709,760 
GAP                                   (4, 4, 512)     (512)          0 
Dropout                               (512)           (512)          0 
Classifier part                       (512)           (5)            2,565 
SoftMax                               (5)             (5)            0 
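
The shapes and parameter counts in Table 3 can be cross-checked with simple arithmetic. The sketch below traces the feature-map side length through the six blocks and counts the parameters of the classifier and of the added block. Several details are assumptions that the text does not state: "same" padding, a stride of 2 in the added block (suggested by the 7x7 to 4x4 reduction), bias-free convolutions, a 1x1 projection shortcut in the first added unit, and one batch normalization layer with two trainable vectors after every convolution. Under those assumptions the count happens to equal the difference between the trainable parameters reported for the two models in Table 5.

```python
import math


def trace_spatial_size(input_size=224):
    """Follow the feature-map side length through the network ('same' padding assumed)."""
    stages = [
        ("conv1", 2), ("max-pool", 2), ("block 2", 1), ("block 3", 2),
        ("block 4", 2), ("block 5", 2),
        ("added block 6", 2),   # stride 2 is assumed from the 7x7 -> 4x4 reduction
    ]
    size, trace = input_size, []
    for name, stride in stages:
        size = math.ceil(size / stride)
        trace.append((name, size))
    return trace


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel, c_in, c_out):
    """Weights of a bias-free convolution (bias-free is an assumption)."""
    return kernel * kernel * c_in * c_out


def added_block_params(channels=512, units=3):
    """Trainable parameters of the three added residual units (assumed layout)."""
    convs = units * 2 * conv_params(3, channels, channels)  # two 3x3 convs per unit
    shortcut = conv_params(1, channels, channels)           # assumed 1x1 projection
    n_bn = units * 2 + 1                                    # one BN per convolution
    bn = n_bn * 2 * channels                                # trainable scale and shift
    return convs + shortcut + bn


if __name__ == "__main__":
    for name, size in trace_spatial_size():
        print(f"{name:<14}{size}x{size}")
    print("classifier parameters:", dense_params(512, 5))
    print("total parameters:", 35_709_760 + dense_params(512, 5))
    print("added block parameters:", added_block_params())
```

The classifier count follows directly from 512 inputs and 5 outputs, so it does not depend on any of the assumptions above.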


Hyperparameters are values set before training that control the performance of the model; 
model performance is then monitored to find the settings that achieve the best results. Image size, 
number of epochs, and learning rate (LR) were selected experimentally. The Adam optimizer, dataset 
shuffling, early stopping, and model checkpointing are the other critical components of our training 
process, as listed in Table 4. 
Table 4. Parameters for training 

Hyperparameter   Setting 
Input size       224x224 
Batch size       32 
Learning rate    0.00005 
Epochs           50 
Early stopping   After 10 epochs Val-loss not improved 
Optimizer        Adam 
Loss function    Categorical cross-entropy 
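
The interaction between the epoch limit, early stopping, and checkpointing in Table 4 can be sketched as follows. This is a framework-independent illustration with hypothetical names, driven by a precomputed list of validation losses instead of a real training loop; it is not the authors' training code.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.wait = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch   # a model checkpoint would be saved here
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


def run_training(val_losses, max_epochs=50, patience=10):
    """Replay per-epoch validation losses under the Table 4 stopping rules."""
    stopper = EarlyStopping(patience)
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if stopper.update(epoch, loss):
            break
    return epochs_run, stopper.best_epoch, stopper.best_loss


if __name__ == "__main__":
    # Hypothetical loss curve: improves for three epochs, then plateaus.
    losses = [0.5, 0.4, 0.3] + [0.35] * 20
    print(run_training(losses))
```

Because the checkpoint is tied to the best validation loss, the model that is finally evaluated is the one from the best epoch, not from the last epoch that ran.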
Figure 4. Architecture of the ResNet-n/DR model with 6 conv blocks after adding 3 residual units to the top 
of the ResNet-34 


3. RESULTS 

The novel ResNet-n/DR achieved the highest accuracy (96.3%) and the lowest loss (0.101) on the 
validation set compared with ResNet-34/DR, as shown in Table 5. The table shows that deepening the 
network with three residual units improved performance and reduced error, since residual units allow the 
network to keep learning as depth increases by preventing gradients from vanishing. The ResNet-n/DR 
training curves in Figure 5 show the progress of learning in our updated network. The training and 
validation curves stayed close to each other across all training epochs, indicating that the new network 
did not suffer from overfitting or underfitting. 
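
The way a residual unit protects the gradient can be made explicit with the standard residual formulation. The symbols $x$, $y$, $F$, and $\mathcal{L}$ are introduced here for illustration only, and the expression is written for a scalar activation for simplicity:

```latex
y = x + F(x), \qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}\left(1 + \frac{\partial F}{\partial x}\right)
```

The constant term 1 comes from the identity shortcut, so part of the gradient reaches earlier layers unchanged even when $\partial F / \partial x$ is small, which is why adding residual units increases depth without the gradient fading.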

The performance of ResNet-n/DR on the testing set for each DR stage is summarized in Table 6, and the 
confusion matrix is shown in Figure 6. We used multiple performance metrics because our dataset is 
imbalanced, so accuracy alone does not always reflect the quality of the model, and because false 
negatives (FN) and false positives (FP) are costly in medical diagnosis. 


Table 5. ResNet-n/DR model performance compared with ResNet-34/DR 

Models         Val accuracy (%)   Val-loss   Trainable parameters 
ResNet-34/DR   89.7               0.215      21,287,237 
ResNet-n/DR    96.3               0.101      35,712,325 


Table 6. ResNet-n/DR model performance on the testing set 

Class              Specificity   Precision   Sensitivity   F1 score 
Class 0            97.9 %        97.8 %      96.7 %        97.2 % 
Class 1            97.9 %        82.5 %      89.2 %        85.7 % 
Class 2            97 %          91.9 %      91 %          91.5 % 
Class 3            97 %          85.7 %      90 %          87.8 % 
Class 4            99.1 %        89.7 %      86.7 %        87.8 % 
Macro average      98.2 %        89.5 %      90.7 %        90.1 % 
Overall accuracy   93.5 % 
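
For reference, the per-class metrics reported in Table 6 are conventionally derived from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below shows that computation on a small hypothetical 3-class matrix; the matrix values are invented for illustration and are not the paper's results, and the code assumes every class occurs and is predicted at least once.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp     # others labelled as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
        results.append({
            "specificity": specificity,
            "precision": precision,
            "sensitivity": sensitivity,
            "f1": f1,
        })
    return results


def macro_average(results, key):
    """Unweighted mean of one metric over all classes."""
    return sum(r[key] for r in results) / len(results)


def overall_accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


if __name__ == "__main__":
    # Hypothetical confusion matrix, NOT taken from Figure 6.
    demo = [[8, 1, 1],
            [2, 6, 2],
            [0, 1, 9]]
    for k, m in enumerate(per_class_metrics(demo)):
        print(k, {name: round(value, 3) for name, value in m.items()})
    print("accuracy:", round(overall_accuracy(demo), 3))
```

The macro average weights every class equally, so small classes such as severe and proliferative DR influence it as much as the normal class does.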
Figure 5. ResNet-34/DR and ResNet-n/DR: (a) ResNet-34/DR accuracy and loss plots and (b) ResNet-n/DR 
accuracy and loss plots 
Figure 6. The confusion matrix of ResNet-n/DR 


4. DISCUSSION 

We demonstrated, after several experiments, that the ResNet-n/DR architecture is better suited to 
the fundus image dataset than the other architectures we tested and achieves the highest accuracy (Acc), 
sensitivity (Sen), and specificity (Sp) among the previous studies listed in Table 7. Several factors 
contribute to this. First, it has a suitable number of parameters for training. In addition, it includes 
batch normalization layers before each nonlinear layer (ReLU) to aid convergence. Finally, the network’s 
architecture incorporates residual modules, allowing for greater depth without the risk of vanishing 
gradients while also boosting performance. 

The added dropout also helped to avoid overfitting. Early stopping saved training time and prevented 
further degradation by tracking the accuracy and loss values during training. The Adam optimizer helped 
reach good results with fewer hyperparameter-tuning experiments. 
Table 7. Evaluation of our ResNet-n/DR model compared to the performance of research on APTOS2019 dataset 

Research                    DR stages               Year   Acc (%)              Sen (%)   Sp (%) 
Shaban et al. [17]          3-class (0, 1+2, 3+4)   2020   88-89                87-89     94-95 
Kassani et al. [21]         5-class                 2019   83.09                88.24     87 
Islam et al. [22]           5-class                 2020   91.326 (loss 0.81)   -         - 
Gangwar and Ravi [23]       5-class                 2021   82.18                -         - 
Al-Smadi et al. [24]        5-class                 2021   77.6-82.4            -         - 
Alyoubi et al. [25]         5-class                 2021   84.1                 89        97.3 
Our proposed ResNet-n/DR    5-class                 2022   93.5                 90.7      98.2 


5. CONCLUSION 

This paper demonstrates the efficiency and feasibility of the proposed deep learning approach to 
diagnosing diabetic retinopathy, which modifies the pre-trained ResNet-34 model into a novel 
ResNet-n/DR architecture. Using the publicly available Kaggle dataset “APTOS2019”, our models can 
efficiently classify DR fundus images into five stages, rather than the frequently used binary diagnosis of 
normal/abnormal. This is one of the main aims of our work, because early-stage diagnosis of DR is critical 
for resolving two significant issues in ophthalmology: minimizing human error and maximizing the 
effectiveness of treatment by allowing the doctor to compare the stages of DR development. 

Conclusions were drawn from the developed model and the results obtained. The extensive 
pre-processing steps substantially enhanced the color contrast in fundus images and removed uninformative 
outer parts of the image, which is one of the key successes of the proposed system. The ResNet-34 network 
was modified after observing that performance improved with increasing depth. ResNet-n/DR was the more 
effective model, with approximately 93.5% accuracy and a lower error rate. The training techniques employed 
in our work, including data augmentation, dropout, and early stopping, contributed to the improvement in 
DR diagnosis results. 